Learning with Symmetric Label Noise: The Importance of Being Unhinged

Authors

  • Brendan van Rooyen
  • Aditya Krishna Menon
  • Robert C. Williamson
Abstract

Convex potential minimisation is the de facto approach to binary classification. However, Long and Servedio [2010] proved that under symmetric label noise (SLN), minimisation of any convex potential over a linear function class can result in classification performance equivalent to random guessing. This ostensibly shows that convex losses are not SLN-robust. In this paper, we propose a convex, classification-calibrated loss and prove that it is SLN-robust. The loss avoids the Long and Servedio [2010] result by virtue of being negatively unbounded. The loss is a modification of the hinge loss, where one does not clamp at zero; hence, we call it the unhinged loss. We show that the optimal unhinged solution is equivalent to that of a strongly regularised SVM, and is the limiting solution for any convex potential; this implies that strong ℓ2 regularisation makes most standard learners SLN-robust. Experiments confirm that the unhinged loss’ SLN-robustness is borne out in practice. So, with apologies to Wilde [1895], while the truth is rarely pure, it can be simple.

1 Learning with symmetric label noise

Binary classification is the canonical supervised learning problem. Given an instance space X, and samples from some distribution D over X × {±1}, the goal is to learn a scorer s : X → ℝ with low misclassification error on future samples drawn from D. Our interest is in the more realistic scenario where the learner observes samples from some corruption D̄ of D, where labels have some constant probability of being flipped, and the goal is still to perform well with respect to D. This problem is known as learning from symmetric label noise (SLN learning) [Angluin and Laird, 1988]. Long and Servedio [2010] showed that there exist linearly separable D where, when the learner observes some corruption D̄ with symmetric label noise of any nonzero rate, minimisation of any convex potential over a linear function class results in classification performance on D that is equivalent to random guessing. Ostensibly, this establishes that convex losses are not “SLN-robust” and motivates the use of non-convex losses [Stempfel and Ralaivola, 2009, Masnadi-Shirazi et al., 2010, Ding and Vishwanathan, 2010, Denchev et al., 2012, Manwani and Sastry, 2013]. In this paper, we propose a convex loss and prove that it is SLN-robust. The loss avoids the result of Long and Servedio [2010] by virtue of being negatively unbounded. The loss is a modification of the hinge loss where one does not clamp at zero; thus, we call it the unhinged loss. This loss has several appealing properties, such as being the unique convex loss satisfying a notion of “strong” SLN-robustness (Proposition 5), being classification-calibrated (Proposition 6), being consistent when minimised on D̄ (Proposition 7), and having a simple optimal solution that is the difference of two kernel means (Equation 8). Finally, we show that this optimal solution is equivalent to that of a strongly regularised SVM (Proposition 8), and is the limiting solution for any twice-differentiable convex potential (Proposition 9), implying that strong ℓ2 regularisation endows most standard learners with SLN-robustness.
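To make the two key objects above concrete (the unhinged loss, which is simply the hinge loss with the clamp at zero removed, and its optimal solution, a difference of two kernel means), here is a minimal Python sketch. The function names and toy data are ours for illustration; the scorer implements only the empirical, unregularised form of the difference-of-kernel-means solution, up to the scaling constant that the paper absorbs into the regularisation strength.

    import numpy as np

    def unhinged_loss(y, v):
        # Hinge loss without the clamp at zero: 1 - y*v instead of max(0, 1 - y*v).
        # Linear, convex, and negatively unbounded.
        return 1.0 - y * v

    def unhinged_scorer(K_test_train, y_train):
        # Empirical form of the optimal unhinged solution (up to scale):
        # s(x) = (1/n) * sum_i y_i k(x, x_i), i.e. a class-prior-weighted
        # difference of the two class kernel means.
        return K_test_train @ y_train / y_train.shape[0]

    # Toy usage: balanced two-class Gaussian data with a linear kernel.
    rng = np.random.default_rng(0)
    X_pos = rng.normal(loc=+1.0, scale=0.5, size=(50, 2))
    X_neg = rng.normal(loc=-1.0, scale=0.5, size=(50, 2))
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(50), -np.ones(50)])

    K = X @ X.T                    # linear-kernel Gram matrix
    scores = unhinged_scorer(K, y)
    print("training accuracy:", np.mean(np.sign(scores) == y))

This linearity in the labels is also the intuition behind SLN-robustness here: flipping each label independently with probability ρ < 1/2 yields noisy labels with expectation (1 − 2ρ)y, so the expected scores are merely rescaled by (1 − 2ρ), leaving their signs, and hence the resulting classifier, unchanged.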

Similar articles

An Effective Approach for Robust Metric Learning in the Presence of Label Noise

Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on their similarity to the query. As generic measures such as ...

On the Robustness of Decision Tree Learning Under Label Noise

In most practical problems of classifier learning, the training data suffers from label noise. Hence, it is important to understand how robust a learning algorithm is to such noise. Experimentally, decision trees have been found to be more robust against label noise than SVMs and logistic regression. This paper presents some theoretical results to show that decision tree algorithms are...

CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise

In this paper, we study the problem of learning image classification models with label noise. Existing approaches that depend on human supervision are generally not scalable, as manually identifying correct or incorrect labels is time-consuming, whereas approaches not relying on human supervision are scalable but less effective. To reduce the amount of human supervision for label noise cleaning, we...

Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise

The growing importance of massive datasets with the advent of deep learning makes robustness to label noise a critical property for classifiers to have. Sources of label noise include automatic labeling for large datasets, non-expert labeling, and label corruption by data poisoning adversaries. In the latter case, corruptions may be arbitrarily bad, even so bad that a classifier predicts the wr...

The effects of traffic noise on memory and auditory-verbal learning in Persian language children

Background: Acoustic noise is one of the universal pollutants of modern society. Although the adverse effects of high noise levels on human hearing have been known for many years, the non-auditory effects of noise, such as effects on cognition, learning, memory, and reading, especially in children, have received less attention. Factors which have a negative impact on these features can also have a negat...

Journal title:

Volume:   Issue:

Pages:  -

Publication date: 2015